Here we replicate the paper’s work by bootstrapping the spectral density of the autoregressive process \[X_t = 0.5X_{t-1} - 0.6X_{t-2} + 0.3X_{t-3} - 0.4X_{t-4} + 0.2X_{t-5} + N_t\] using the given bootstrap algorithm:
set.seed(505)
x1 = arima.sim(n=256, list(ar=c(0.5, -0.6, 0.3, -0.4, 0.2)))
tsplot(x1, col=4)
mvspec(x1, col=4)
mvspec(x1, spans=c(2,2), col=4)
mvspec(x1, spans=c(2,2), log="yes", col=2)
spec = arma.spec(ar=c(0.5, -0.6, 0.3, -0.4, 0.2), main="Autoregression")
Ordinarily, we can only construct a confidence interval for the spectral
density by calling mvspec with the log option.
That interval is based on a chi-squared distribution with only
2 degrees of freedom, so it is too wide to be of much practical use.
Therefore, in this work, we will try a bootstrap approach to
create more accurate spectral density estimates.
The next step is to extract the length and sampling frequency of the data and build the grid of Fourier frequencies.
xfreq = frequency(x1)
n = length(x1)
nspec = floor(n/2)
freq = seq(from = xfreq/n, by = xfreq/n, length = nspec) # extract the frequency
The first step of the algorithm is to center the data by subtracting the sample mean. However, since we will later compare with the true spectral density of the autoregression, we skip this step and proceed to computing the periodogram and the estimated spectral density.
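For completeness, the skipped centering step is a one-liner; a sketch on a freshly simulated copy of the series:

```r
set.seed(505)
x <- arima.sim(n = 256, list(ar = c(0.5, -0.6, 0.3, -0.4, 0.2)))
xc <- x - mean(x)  # centered series: sample mean is numerically zero
```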
The second step computes the periodogram and an initial estimate of the
spectral density using a kernel \(K\) and a global bandwidth \(h\). The paper defines the periodogram to
be \[I(\omega) = \frac{1}{2\pi n} \Big|
\sum_{t=1}^n x_t e^{-it \omega} \Big|^2\] for \(\omega_j = 2\pi j/n, \ -n \leq j \leq n, \ -\pi \leq
\omega_j \leq \pi\).
The formula is, indeed, equivalent to our book’s definition of the
periodogram \[I(\omega_j) = |d(\omega_j)|^2 =
\frac{1}{n} \Big| \sum_{t=1}^n x_t e^{-2\pi i \omega_j t} \Big|^2\]
for \(j = 0,1,\dots, n-1\) since \(\omega = 2\pi j/n\), only stated on a different
frequency domain. Thus, we will use the book’s code to compute the periodogram \(I\). Everything we do after this will be
in the frequency domain.
I = Mod(fft(x1)/sqrt(n))^2 # normalize by the series length n
I = I[1:floor(n/2)] # since I is symmetric, we will keep the first half
tsplot(freq, I, col=4)
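As a sanity check on this normalization, the full set of ordinates Mod(fft(x)/sqrt(n))^2 satisfies Parseval's identity: they sum to the sum of squares of the series. A quick check on arbitrary data:

```r
set.seed(1)
x <- rnorm(128)
I.full <- Mod(fft(x) / sqrt(length(x)))^2  # all n periodogram ordinates
# Parseval: the sum of the ordinates equals the sum of squares of x
all.equal(sum(I.full), sum(x^2))  # TRUE
```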
Consequently, the paper chooses an initial bandwidth and a kernel to
calculate the estimated spectral density \[\hat{f}(\omega,h) = \frac{1}{nh} \sum_{k =-n}^n K
\Big(\frac{\omega-\omega_k}{h} \Big) I(\omega_k),\] where \(K(\cdot)\) is a given symmetric,
nonnegative function on the real line. It is noted that the initial
global bandwidth should not depend on \(\omega\). The paper chooses the Bartlett
kernel with a global bandwidth of 0.05. In our exploration, we will work
with the Daniell kernel with spans
c(5,5). The next cell
demonstrates the kernel weights and how we use the kernel to
get the smoothed spectral density.
(dm = kernel("modified.daniell", c(5,5))) # for a list
## mDaniell(5,5)
## coef[-10] = 0.0025
## coef[ -9] = 0.0100
## coef[ -8] = 0.0200
## coef[ -7] = 0.0300
## coef[ -6] = 0.0400
## coef[ -5] = 0.0500
## coef[ -4] = 0.0600
## coef[ -3] = 0.0700
## coef[ -2] = 0.0800
## coef[ -1] = 0.0900
## coef[ 0] = 0.0950
## coef[ 1] = 0.0900
## coef[ 2] = 0.0800
## coef[ 3] = 0.0700
## coef[ 4] = 0.0600
## coef[ 5] = 0.0500
## coef[ 6] = 0.0400
## coef[ 7] = 0.0300
## coef[ 8] = 0.0200
## coef[ 9] = 0.0100
## coef[ 10] = 0.0025
par(mfrow=1:2)
plot(kernel("daniell", 5), ylab=expression(h[~k]), col=4, panel.first=Grid(nxm=5))
plot(dm, ylab=expression(h[~k]), panel.first=Grid(), col=4) # for a plot
h = c(5,5) # initial bandwidth
kd = kernel("daniell", h)
f.hat = kernapply(I, kd, circular=TRUE)
tsplot(freq, f.hat, col=4)
Our next step is to estimate the residuals \[\hat{\epsilon}_k = \frac{I(\omega_k)}{\hat{f}(\omega_k,h)}, \quad k = 1, \dots, n.\] This follows from interpreting the spectral estimator as the approximate multiplicative regression \[I(\omega_k) = \hat{f}(\omega_k) \cdot \epsilon_k.\] The paper then rescales the residuals so that they have mean one, which avoids an additional bias in the resampling stage: \[\tilde{\epsilon}_k = \frac{\hat{\epsilon}_k}{\hat{\epsilon}_\cdot}, \text{ for } k=1, \dots, n, \text{ where } \hat{\epsilon}_\cdot = \frac{1}{n} \sum_{k=1}^n \hat{\epsilon}_k.\]
eps = I / f.hat
scaled_eps = eps / mean(eps)
mean(scaled_eps)
## [1] 1
By construction, the mean of the scaled residuals is 1.
Now we resample the scaled residuals and repeat the bootstrap estimate a large number of times. The resampled periodogram is computed as \[I^*(\omega_k) = I^*(-\omega_k) = \hat{f}(\omega_k,g)\, \tilde{\epsilon}_k^*,\] where \(g\) is a resampling bandwidth and \(\{\tilde{\epsilon}_k^*\}\) is a resample (with replacement) of the scaled residuals. We will pick \(g\) to be the same as the global bandwidth. The bootstrap kernel spectral density is then calculated as \[\hat{f}^*(\omega,h,g) = \frac{1}{nh}\sum_{k=-n}^{n}K \Big(\frac{\omega-\omega_k}{h} \Big) I^*(\omega_k).\]
nboot = 1000
f.star.dist = vector("list", nboot)
# resampling spectral estimate, with bandwidth g equal to the global bandwidth h
f.hat.g = kernapply(I, kd, circular=TRUE)
for (i in 1:nboot)
{
scaled_eps.star = sample(scaled_eps, replace=TRUE) # resample the scaled residuals
# resampled periodogram
I.star = f.hat.g * scaled_eps.star
# bootstrap kernel spectral density
f.star.dist[[i]] = kernapply(I.star, kd, circular=TRUE)
}
lwr = c()
upr = c()
for (i in 1:nspec)
{
f.star.wi = c()
for (j in 1:nboot)
{
f.star.wi[j] = f.star.dist[[j]][i]
}
# the 2.5th percentile of the bootstrap spectral densities at each frequency
lwr[i] = quantile(f.star.wi, probs=c(.025, .975))[1]
# the 97.5th percentile of the bootstrap spectral densities at each frequency
upr[i] = quantile(f.star.wi, probs=c(.025, .975))[2]
}
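The double loop can also be written with apply; a self-contained sketch on a stand-in bootstrap matrix (the exponential draws and dimensions are illustrative, not our actual bootstrap output):

```r
set.seed(2)
B <- 500; m <- 128
boot.mat <- matrix(rexp(B * m), nrow = B)  # rows = replicates, cols = frequencies
ci <- apply(boot.mat, 2, quantile, probs = c(.025, .975))
lo <- ci[1, ]  # 2.5th percentile at each frequency
hi <- ci[2, ]  # 97.5th percentile at each frequency
```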
arma.spec(ar=c(0.5, -0.6, 0.3, -0.4, 0.2), main="Autoregression", ylim=c(0, max(upr,max(spec$spec))))
tsplot(freq, I, col=4)
lines(freq, upr, col=2)
lines(freq, lwr, col=2)
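Since the data are simulated, we can also evaluate the true AR(5) spectral density \(f(\nu) = \sigma_w^2/|\phi(e^{-2\pi i \nu})|^2\) directly and check where its peak falls; a self-contained base-R sketch (with \(\sigma_w^2 = 1\) and frequencies in cycles per observation):

```r
ar <- c(0.5, -0.6, 0.3, -0.4, 0.2)           # the AR(5) coefficients from above
true.spec <- function(nu) {                  # nu in cycles per observation
  phi <- 1 - sapply(nu, function(v) sum(ar * exp(-2i * pi * v * (1:5))))
  1 / Mod(phi)^2                             # sigma_w^2 = 1
}
nu <- seq(1/256, 0.5, by = 1/256)
f.true <- true.spec(nu)
nu[which.max(f.true)]  # the true peak lies in the 0.1-0.25 band seen above
```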
From the previous exploration, I built my first R package to bootstrap the spectral density based on resampling the periodogram. You can find the complete package here: https://github.com/uyenle-gh/spec.boots
The following chunk installs the package into your environment.
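A minimal install sketch (assuming the remotes package is available; devtools::install_github would work the same way):

```r
# install spec.boots from GitHub (one-time setup; assumes remotes is installed)
remotes::install_github("uyenle-gh/spec.boots")
library(spec.boots)
```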
Now we will apply our spec.boots function to produce a
confidence interval for the spectral density of the autoregression:
kd = kernel("daniell", c(5,5))
spec = spec.boots(x1, 1000, kd, demean=FALSE, alpha=0.95)
Most of the power is concentrated between frequencies 0.13 and 0.23, where the 95% confidence interval lies above the median baseline. This lets us conclude that the peak we observe around frequency 0.18 is distinguishable from the baseline and statistically significant.
Another example where we know the exact frequencies is demonstrated below:
y1 = 2*cos(2*pi*1:100*6/100) + 3*sin(2*pi*1:100*6/100)
y2 = 4*cos(2*pi*1:100*10/100) + 5*sin(2*pi*1:100*10/100)
y3 = 6*cos(2*pi*1:100*40/100) + 7*sin(2*pi*1:100*40/100)
y = y1 + y2 + y3
y = ts(y)
spec.boots(y, 1000, kernel("daniell", c(2,2)))
## freq lwr upr
## 1 0.01 0.270223239 17.45207554
## 2 0.02 0.425149841 29.17483180
## 3 0.03 3.613353560 50.40524749
## 4 0.04 6.684144499 84.74913091
## 5 0.05 10.228294102 123.99912881
## 6 0.06 14.113791148 168.73174892
## 7 0.07 20.600388027 241.37377739
## 8 0.08 26.961871032 289.92339361
## 9 0.09 26.730406242 341.70273120
## 10 0.10 29.632080473 374.96964289
## 11 0.11 32.398298039 398.79006550
## 12 0.12 31.765245861 349.16376089
## 13 0.13 24.845023810 295.33638276
## 14 0.14 17.668298140 219.33958502
## 15 0.15 10.316856806 146.58849611
## 16 0.16 0.175864848 85.43525666
## 17 0.17 0.034660813 47.58141656
## 18 0.18 0.010259467 23.60354769
## 19 0.19 0.006406585 9.76800883
## 20 0.20 0.003795360 0.03046949
## 21 0.21 0.003587928 0.02883219
## 22 0.22 0.002863197 0.02676836
## 23 0.23 0.002688253 0.02466545
## 24 0.24 0.002506663 0.02304150
## 25 0.25 0.002309986 0.02119643
## 26 0.26 0.002289392 0.01988411
## 27 0.27 0.002019022 0.01818709
## 28 0.28 0.001860394 0.01632454
## 29 0.29 0.001854202 0.01600192
## 30 0.30 0.001750333 0.01493799
## 31 0.31 0.001617986 0.01457835
## 32 0.32 0.001594527 0.01409162
## 33 0.33 0.002667105 20.36772144
## 34 0.34 0.005072163 49.24809472
## 35 0.35 0.013794963 99.34851270
## 36 0.36 0.180623270 193.68520762
## 37 0.37 21.190621572 311.86491605
## 38 0.38 18.517373620 428.97680477
## 39 0.39 35.057150417 549.14836718
## 40 0.40 44.195429756 666.87758093
## 41 0.41 54.207926080 737.32322266
## 42 0.42 50.298037454 684.59572880
## 43 0.43 46.306856477 587.67380059
## 44 0.44 33.288276679 451.25448922
## 45 0.45 21.456315942 300.87576557
## 46 0.46 1.426250090 189.53943603
## 47 0.47 0.194874613 102.91956909
## 48 0.48 0.167327934 51.76333907
## 49 0.49 0.152880313 21.09192861
## 50 0.50 0.128516663 8.53530200
The spectral density plot of this example shows two peaks, around frequencies 0.1 and 0.4; the weaker component at frequency 0.06 is absorbed into the 0.1 peak by the smoothing. The confidence interval around frequency 0.1 is only slightly above the median baseline, so we suspect that this frequency has a comparatively small effect.
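We can confirm the peak locations directly from the raw periodogram of y: with amplitudes \(\sqrt{13}\), \(\sqrt{41}\), and \(\sqrt{85}\) at frequencies 0.06, 0.10, and 0.40, the 0.40 component should dominate. A self-contained check:

```r
t <- 1:100
y <- 2*cos(2*pi*t*6/100)  + 3*sin(2*pi*t*6/100) +
     4*cos(2*pi*t*10/100) + 5*sin(2*pi*t*10/100) +
     6*cos(2*pi*t*40/100) + 7*sin(2*pi*t*40/100)
I.y <- Mod(fft(y) / sqrt(100))^2
I.y <- I.y[2:51]           # drop the mean ordinate; keep frequencies 0.01 to 0.50
freq.y <- (1:50) / 100
freq.y[which.max(I.y)]     # -> 0.4, the largest-amplitude component
sort(freq.y[order(I.y, decreasing = TRUE)[1:3]])  # -> 0.06 0.10 0.40
```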
In the next few chunks, we demonstrate how to use
spec.boots with real data and compare the results with those
produced by the mvspec command.
data("AirPassengers")
tsplot(AirPassengers, col=4, lwd=2)
mvspec(AirPassengers, col=4, lwd=2)
mvspec(AirPassengers, log='y', col=4, lwd=2)
mvspec(AirPassengers, log='y', spans=c(4,4), col=4, lwd=2)
spec.boots(AirPassengers, 1000, c(5,5))
## freq lwr upr
## 1 0.08333333 355.72568 4142.2920
## 2 0.16666667 420.54207 4576.5138
## 3 0.25000000 446.45384 4731.9612
## 4 0.33333333 422.74234 4440.5706
## 5 0.41666667 374.69175 3664.4454
## 6 0.50000000 370.13121 3068.4375
## 7 0.58333333 439.38391 3310.1799
## 8 0.66666667 603.41517 6251.2021
## 9 0.75000000 1010.98026 11819.4380
## 10 0.83333333 1672.14100 18909.4257
## 11 0.91666667 2470.73436 27376.6577
## 12 1.00000000 3079.55389 35076.7355
## 13 1.08333333 3237.30457 38125.7041
## 14 1.16666667 3126.18607 34922.3982
## 15 1.25000000 2541.43597 28261.3938
## 16 1.33333333 1819.00055 20618.4469
## 17 1.41666667 1149.11314 12820.0530
## 18 1.50000000 680.91683 7033.2147
## 19 1.58333333 470.55867 3831.4547
## 20 1.66666667 399.48993 3018.9933
## 21 1.75000000 473.26118 4337.7881
## 22 1.83333333 630.24126 6348.0246
## 23 1.91666667 851.76192 8489.9060
## 24 2.00000000 1096.04818 10263.8790
## 25 2.08333333 1064.68364 11140.4275
## 26 2.16666667 948.72250 10389.6724
## 27 2.25000000 676.99831 8208.0631
## 28 2.33333333 452.50726 5646.9596
## 29 2.41666667 261.12091 3266.7342
## 30 2.50000000 153.35273 1676.9989
## 31 2.58333333 89.47945 794.7784
## 32 2.66666667 69.14466 491.7707
## 33 2.75000000 69.27338 691.4979
## 34 2.83333333 95.04063 1011.3973
## 35 2.91666667 119.18149 1306.3992
## 36 3.00000000 140.21235 1499.2165
## 37 3.08333333 146.98432 1523.9207
## 38 3.16666667 133.40988 1400.2227
## 39 3.25000000 100.98750 1084.9823
## 40 3.33333333 70.90411 786.7359
## 41 3.41666667 43.53271 476.4370
## 42 3.50000000 31.06174 271.6371
## 43 3.58333333 28.12104 197.0744
## 44 3.66666667 33.69321 279.0043
## 45 3.75000000 48.24582 456.2280
## 46 3.83333333 67.65395 678.4795
## 47 3.91666667 90.14435 877.9652
## 48 4.00000000 111.81482 1113.2969
## 49 4.08333333 115.11076 1148.1218
## 50 4.16666667 104.20064 1066.9908
## 51 4.25000000 84.77034 877.5241
## 52 4.33333333 56.60093 665.5390
## 53 4.41666667 38.67698 434.6461
## 54 4.50000000 26.24454 254.1069
## 55 4.58333333 21.21035 157.6562
## 56 4.66666667 21.15284 174.4998
## 57 4.75000000 26.50933 271.4079
## 58 4.83333333 37.66147 424.9991
## 59 4.91666667 49.12580 557.1178
## 60 5.00000000 59.07278 665.9334
## 61 5.08333333 66.56840 710.2042
## 62 5.16666667 67.42700 700.9098
## 63 5.25000000 57.33906 621.9711
## 64 5.33333333 43.68773 484.2052
## 65 5.41666667 31.25137 300.4933
## 66 5.50000000 23.20855 186.3925
## 67 5.58333333 19.79099 133.5227
## 68 5.66666667 26.31766 238.2488
## 69 5.75000000 44.71610 579.7101
## 70 5.83333333 79.27307 1225.4516
## 71 5.91666667 148.83136 2084.3958
## 72 6.00000000 264.50198 3187.1431
For the AirPassengers dataset, both the
mvspec and spec.boots functions agree that
frequencies 1 and 2 (cycles per year) are dominant. However, the dominance is
much more apparent in the plot of spec.boots.
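As a cross-check with base R alone (stats::spec.pgram rather than astsa's mvspec), after linear detrending the strongest peak away from the low-frequency trend should be the annual cycle at frequency 1:

```r
sp <- spec.pgram(AirPassengers, taper = 0, detrend = TRUE, plot = FALSE)
seas <- sp$freq >= 0.5                    # ignore leftover low-frequency trend power
sp$freq[seas][which.max(sp$spec[seas])]   # expect the annual cycle, near frequency 1
```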
data(Inflation)
tsplot(Inflation$CPIPctDiff, col=4, lwd=2)
mvspec(Inflation$CPIPctDiff, col=4, lwd=2)
mvspec(Inflation$CPIPctDiff, col=4, lwd=2, log="y")
mvspec(Inflation$CPIPctDiff, col=4, lwd=2, log="y", spans=c(4,4))
kd = kernel("daniell", c(5,5))
spec.boots(ts(Inflation$CPIPctDiff) , 1000, kd)
## freq lwr upr
## 1 0.01041667 0.04689408 0.16034337
## 2 0.02083333 0.05096311 0.17774595
## 3 0.03125000 0.05652662 0.19706106
## 4 0.04166667 0.06206656 0.21246629
## 5 0.05208333 0.06663648 0.22782192
## 6 0.06250000 0.07400268 0.24538039
## 7 0.07291667 0.07630532 0.25816234
## 8 0.08333333 0.08003779 0.27015910
## 9 0.09375000 0.08309988 0.27978048
## 10 0.10416667 0.08486199 0.28719829
## 11 0.11458333 0.08589066 0.29140894
## 12 0.12500000 0.08535325 0.29285489
## 13 0.13541667 0.08409070 0.28472460
## 14 0.14583333 0.08288236 0.28962627
## 15 0.15625000 0.08092890 0.28004080
## 16 0.16666667 0.07730652 0.27240448
## 17 0.17708333 0.07334972 0.25616363
## 18 0.18750000 0.06993991 0.23825446
## 19 0.19791667 0.06530931 0.22159049
## 20 0.20833333 0.06127102 0.20079512
## 21 0.21875000 0.05714619 0.18851591
## 22 0.22916667 0.05298595 0.17638655
## 23 0.23958333 0.04782545 0.16248253
## 24 0.25000000 0.04432116 0.14670200
## 25 0.26041667 0.03973045 0.13551853
## 26 0.27083333 0.03631679 0.11971745
## 27 0.28125000 0.03261030 0.10677109
## 28 0.29166667 0.02990924 0.09433415
## 29 0.30208333 0.02735213 0.08360982
## 30 0.31250000 0.02451812 0.07349763
## 31 0.32291667 0.02169980 0.06683070
## 32 0.33333333 0.02037498 0.06154113
## 33 0.34375000 0.01940105 0.05533246
## 34 0.35416667 0.01847060 0.05282994
## 35 0.36458333 0.01800448 0.05127177
## 36 0.37500000 0.01742394 0.05231938
## 37 0.38541667 0.01752132 0.05288002
## 38 0.39583333 0.01759420 0.05403791
## 39 0.40625000 0.01831707 0.05653643
## 40 0.41666667 0.01995307 0.06058135
## 41 0.42708333 0.02096501 0.06471582
## 42 0.43750000 0.02321579 0.06923398
## 43 0.44791667 0.02560699 0.07686749
## 44 0.45833333 0.02791782 0.08584259
## 45 0.46875000 0.03119321 0.09813570
## 46 0.47916667 0.03505789 0.11217101
## 47 0.48958333 0.03859909 0.12435642
## 48 0.50000000 0.04287094 0.14097112
The spectral density of the CPI returns peaks around frequency 0.1
in both the mvspec and spec.boots output. Here, it is easier to tell which
frequencies matter since we have a baseline to rely on. One drawback of this
approach, as you can also tell from the two plots of spec.boots and
mvspec, is that averaging flattens some peaks and spreads their power
into neighboring frequencies.